Conversation
    # Load the HF model from config
    config_load = args.hf_config_path
    config = safe_load_config_with_retry(config_load, trust_remote_code=False)
    bridge = AutoBridge.from_hf_config(config)
Will this save-ckpt step allocate extra GPU memory when initializing an HF model?
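If the concern is materializing a full HF model just to derive the bridge, one general technique is to instantiate modules on PyTorch's meta device so no parameter storage is allocated on CPU or GPU. This is a sketch of that technique only, not the PR's current code path; whether `AutoBridge.from_hf_config` already avoids materialization would need checking.

```python
import torch
import torch.nn as nn

# Modules constructed under the meta device carry shape/dtype metadata
# only -- no real parameter storage is allocated on CPU or GPU.
with torch.device("meta"):
    proj = nn.Linear(4096, 4096, bias=False)

print(proj.weight.is_meta)   # True: the tensor has no backing storage
print(proj.weight.shape)     # torch.Size([4096, 4096])
```

Weights would then be filled in later (e.g. by the bridge's HF weight loading) rather than allocated twice.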
    bridge.load_hf_weights(ddp_model)
    # no optimizer weight
    iteration=0
    num_floating_point_operations_so_far=0
please add print_rank_0 here
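For reference, the rank-guarded log call could look like the sketch below. Megatron provides its own `print_rank_0` in its training utilities; the fallback definition here is only so the snippet is self-contained, and the message text is illustrative.

```python
def print_rank_0(message):
    """Print only on global rank 0; falls back to a plain print when
    torch.distributed is unavailable or uninitialized (illustrative
    stand-in for Megatron's own print_rank_0)."""
    try:
        import torch.distributed as dist
        if dist.is_available() and dist.is_initialized():
            if dist.get_rank() == 0:
                print(message, flush=True)
            return
    except ImportError:
        pass
    print(message, flush=True)

# After loading HF weights, log the reset training state once:
iteration = 0
num_floating_point_operations_so_far = 0
print_rank_0(f"Loaded HF weights; iteration={iteration}, "
             f"flops_so_far={num_floating_point_operations_so_far}")
```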
    # use megatron bridge
    from megatron.nemo_bridge.models import AutoBridge
    bridge=AutoBridge.from_hf_pretrained(load_dir)
    bridge.load_hf_weights(ddp_model)
Can nemo-bridge's load_hf_weights handle a ddp_model directly, where ddp_model is wrapped by DistributedDataParallel?
    @@ -0,0 +1,8 @@
    # Copyright (c) 2025, BAAI. All rights reserved.
Nemo's megatron-bridge can be installed via pip (ref: https://pypi.org/project/megatron-bridge/).
Please remove the vendored source code.
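If a vendored copy must be kept temporarily, a guarded import would let FlagScale prefer the pip-installed package. The `megatron.bridge.models` import path below is an assumption based on this review's rename suggestion, not verified against the released package.

```python
# Prefer the released megatron-bridge package (pip install megatron-bridge);
# fall back to a vendored copy only when it is not installed.
# NOTE: the import path is an assumption taken from the review discussion.
try:
    from megatron.bridge.models import AutoBridge
    HAVE_PIP_BRIDGE = True
except ImportError:
    AutoBridge = None  # a vendored fallback import would go here
    HAVE_PIP_BRIDGE = False
```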
    @@ -0,0 +1,8 @@
    # Copyright (c) 2025, BAAI. All rights reserved.
Rename flagscale/train/megatron/nemo_bridge to flagscale/train/megatron/bridge so that it matches the import pattern from megatron.bridge
tengqm left a comment:
When copy-pasting source code from other repos, we are obliged to copy their copyright notice as well. We cannot claim copyright on this code.
The original code has following copyright header to be preserved:
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

    @@ -0,0 +1,110 @@
    # Copyright (c) 2025, BAAI. All rights reserved.
    #
    # Copied from: https://github.com/NVIDIA-NeMo/Megatron-Bridge
If Megatron-Bridge has a copyright claim, we are supposed to paste their copyright statements here.
    if not has_implementation:
        raise ValueError(
            f"\n�~\~W Model architecture '{architecture}' is not yet supported\n\n"
What are these weird characters?
There are some other similar cases in this string.
    @@ -0,0 +1,359 @@
    # Copyright (c) 2025, BAAI. All rights reserved.
    #
    # Mainly adapted from: https://github.com/NVIDIA-NeMo/Megatron-Bridge
Please clarify what has been "borrowed".
Please also paste the original copyright claim here if the code was not originally written by us.
    @@ -0,0 +1,202 @@
    # Copyright (c) 2025, BAAI. All rights reserved.
Looks to me that this file was largely adapted from flagscale/train/megatron/nemo_bridge/models/conversion/auto_bridge.py. We copy-pasted the source and are claiming copyright on this code, which is not acceptable.
We can borrow code from other projects, provided that the license terms grant us this right. Even then, we still have to give credit to the original authors and are obliged to mention their copyrights.
There are also some weird characters in this file, obviously a character-conversion problem introduced during copy/paste. Please fix them as well.
Received, thanks.
Reconstruct the Nemo-Bridge integration on top of the restructured FlagScale version. FlagScale now supports some nemo-bridge functionality, enabling the framework to load and save checkpoints in HF format during training. This version also adds a new feature: the interval (in iterations) between HF-weight saves can be set via save_hf_interval. Accuracy has been verified to be correct for Deepseek V3 16_a3B, Qwen3-32B, and Qwen3-0.6B.
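The save_hf_interval gating described above presumably reduces to a periodic check like the one below. This is a hypothetical sketch mirroring the PR description; the actual flag plumbing lives in FlagScale's training loop and may differ.

```python
def should_save_hf(iteration, save_hf_interval):
    """Return True when HF-format weights should be written at this
    iteration (hypothetical helper; `save_hf_interval` of None or <= 0
    disables the feature, and iteration 0 is skipped)."""
    if not save_hf_interval or save_hf_interval <= 0:
        return False
    return iteration > 0 and iteration % save_hf_interval == 0


print(should_save_hf(500, 500))    # True: first save point
print(should_save_hf(1500, 500))   # True: every 500 iterations
print(should_save_hf(499, 500))    # False: between save points
print(should_save_hf(0, 500))      # False: initial iteration skipped
print(should_save_hf(500, None))   # False: feature disabled
```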